Exploratory Data Analysis

Our exploratory data analysis examines patterns that inform both research questions: usage context and feature adoption. We organize our exploration into five main categories:

Blocked Website Patterns

Figure 5: Website Usage Analysis - Distribution of blocked websites by category (top) and frequency of individual websites (bottom)

The analysis of blocked websites reveals distinct patterns in where users turn the Jargon extension off. Professional tools, particularly Salesforce and AI platforms, are the most frequently blocked, suggesting that users tend to avoid Jargon during work-related activities. The presence of development environment blocks indicates that some users are technical professionals, though this group represents only a modest portion of the overall user base. Educational content also features prominently among blocked websites: users often disable the extension on documentation sites and learning platforms, possibly to maintain focus during concentrated study sessions.

However, only 27 blocked sites were recorded across 92 users. The blocking feature is therefore not widely used, and the current data may not be conclusive. These findings should be treated as tentative, since they may not represent the broader user population.

Language Mode Usage

Figure 6: Scatter plot showing the relationship between user adoption and question generation across different language modes

The scatter plot highlights key patterns in language mode usage:

  • Spanish is the most active mode, with the highest number of questions (~800) and users (~30).
  • GlizzyTalk and Tamil show moderate engagement (~300 questions each).
  • Korean and GRE Vocabulary form a middle tier (~200 questions).
  • Most other languages have low adoption, with fewer users and questions.
  • Some modes (e.g., Tamil) have high question counts despite fewer users, indicating intensive use by dedicated learners.

Overall, while usage intensity and adoption vary widely across languages, traditional language learning modes drive most activity.
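The distinction between adoption (users per mode) and intensity (questions per user) can be made concrete with a small aggregation. The following is a minimal sketch, not the analysis code behind Figure 6: the record layout `(user_id, language_mode)`, the function name `summarize_modes`, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical records, one per generated question: (user_id, language_mode).
# These rows are invented for illustration and are not the study's data.
events = [
    ("u1", "Spanish"), ("u2", "Spanish"), ("u3", "Spanish"), ("u1", "Spanish"),
    ("u4", "Tamil"), ("u4", "Tamil"), ("u4", "Tamil"),
    ("u5", "Korean"),
]


def summarize_modes(events):
    """Return {mode: (n_users, n_questions, questions_per_user)}."""
    questions = defaultdict(int)
    users = defaultdict(set)
    for user_id, mode in events:
        questions[mode] += 1
        users[mode].add(user_id)
    return {
        mode: (len(users[mode]), questions[mode], questions[mode] / len(users[mode]))
        for mode in questions
    }


summary = summarize_modes(events)

# Print modes ordered by total questions, highest first.
for mode, (n_users, n_questions, per_user) in sorted(
    summary.items(), key=lambda kv: kv[1][1], reverse=True
):
    print(f"{mode:10s} users={n_users} questions={n_questions} per_user={per_user:.2f}")
```

In this toy input, the single-user mode has the highest questions-per-user value even though it has fewer total questions, which is the kind of pattern the Tamil observation describes.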

Word Frequency Analysis

Figure 7: Word frequency analysis showing common words (top) and word pairs (bottom) in learning content.

Insights from Word and Phrase Frequency Analysis (based on the English original sentences selected for content generation):

  • The most common words and word pairs (e.g., “currents,” “ice,” “churn,” “concentric,” “ice form,” “churn water”) suggest that users frequently select technical or scientific content for practice, possibly from educational or informational sources.
  • Many frequent terms are descriptive and process-oriented, naming physical processes or states (e.g., “breeze,” “rolls,” “floating ball,” “gentle churn”). This indicates an emphasis on dynamic or descriptive language in the learning material.
  • The recurrence of similar words and phrases (e.g., “form,” “water”) implies that certain concepts or topics are repeatedly practiced, which may reflect user interests or the nature of the source material.

Overall, the word frequency analysis reveals that users are engaging most with scientific and descriptive content, focusing on process-oriented vocabulary and recurring technical terms.
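Word and word-pair counts of this kind can be produced with a simple tokenizer and two counters. The sketch below is one possible approach and is not taken from the original analysis: the stopword list, the choice to drop stopwords before forming pairs, the helper names, and the example sentences are all assumptions.

```python
import re
from collections import Counter

# Assumed, deliberately small stopword list; a real analysis would likely use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "is", "as"}


def tokenize(sentence):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def word_and_pair_counts(sentences):
    """Count single words and adjacent word pairs within each sentence."""
    words, pairs = Counter(), Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        words.update(tokens)
        # Pairs are formed after stopword removal, so "churn the water" yields ("churn", "water").
        pairs.update(zip(tokens, tokens[1:]))
    return words, pairs


# Invented example sentences, not drawn from the study's data.
sentences = [
    "Currents churn the water as ice forms.",
    "A gentle breeze rolls the floating ball on the water.",
]

words, pairs = word_and_pair_counts(sentences)
print(words.most_common(3))
print(pairs.most_common(3))
```

Whether pairs are built before or after stopword removal changes which pairs appear, so that choice should be stated alongside any frequency plot.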

Temporal Patterns

Daily Activity

Figure 8: Daily activity patterns showing question generation and active users with their respective averages (red dashed lines) over the observation period, based on UTC timezone.

Weekly Activity

Figure 9: Weekly activity patterns showing average questions generated and active users by day of week (UTC timezone), with error bars indicating standard error and overall averages shown as red dashed lines.

The temporal analysis reveals several key patterns in user engagement, based on both daily and weekly activity (all timestamps in UTC):

  • Daily Trends: Question generation and active user counts fluctuate considerably from day to day, with occasional spikes (up to 200 questions or 12 users), but most days remain below the average (12.5 questions, 2.2 users). This indicates a small but steady user base, with 1 to 5 active users on most days.

  • Weekly Trends: Question generation is highest on Mondays, Tuesdays, and Wednesdays, then tapers off toward the weekend, suggesting users are more engaged during the workweek. There is substantial variability across days, as shown by the error bars.

Together, these patterns indicate that Jargon’s usage is characterized by low but regular engagement, with activity peaking in the first half of the workweek and significant day-to-day variability. This suggests a core group of users who interact with the platform most during the workweek.
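The daily and weekly summaries can be derived from timestamped question events in two steps: bucket events by UTC calendar day, then average the daily counts per weekday and attach a standard error. The sketch below assumes a record layout of `(timestamp, user_id)` and uses invented rows; the function names are hypothetical. One caveat of this simple version is that days with no events are absent from the buckets, so they do not pull the weekday averages down.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean, stdev

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical (timestamp, user_id) rows, one per generated question; not the study's data.
events = [
    (datetime(2025, 3, 3, 9, 0, tzinfo=timezone.utc), "u1"),    # Monday
    (datetime(2025, 3, 3, 9, 5, tzinfo=timezone.utc), "u1"),    # Monday
    (datetime(2025, 3, 3, 18, 30, tzinfo=timezone.utc), "u2"),  # Monday
    (datetime(2025, 3, 8, 11, 0, tzinfo=timezone.utc), "u3"),   # Saturday
    (datetime(2025, 3, 10, 7, 45, tzinfo=timezone.utc), "u1"),  # Monday
]


def daily_counts(events):
    """Return {date: (n_questions, n_active_users)} using UTC calendar days."""
    questions = defaultdict(int)
    users = defaultdict(set)
    for ts, user_id in events:
        day = ts.astimezone(timezone.utc).date()
        questions[day] += 1
        users[day].add(user_id)
    return {day: (questions[day], len(users[day])) for day in questions}


def weekday_summary(daily):
    """Return {weekday: (mean_questions, standard_error)} over observed days."""
    by_weekday = defaultdict(list)
    for day, (n_questions, _n_users) in daily.items():
        by_weekday[WEEKDAYS[day.weekday()]].append(n_questions)
    summary = {}
    for name, counts in by_weekday.items():
        # Standard error of the mean; undefined for a single day, so report 0.0 there.
        se = stdev(counts) / len(counts) ** 0.5 if len(counts) > 1 else 0.0
        summary[name] = (mean(counts), se)
    return summary


daily = daily_counts(events)
weekly = weekday_summary(daily)
for name in WEEKDAYS:
    if name in weekly:
        avg, se = weekly[name]
        print(f"{name:9s} mean={avg:.1f} se={se:.2f}")
```

Because bucketing is done in UTC, an evening session for a user in a western time zone can land on the following UTC day, which is worth keeping in mind when reading weekday effects.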

User Engagement Distribution

Figure 10: Distribution of key engagement metrics across users, showing individual violin plots for each metric with median and interquartile range (IQR) statistics. Each plot uses a distinct color and includes summary statistics.

The violin plots show how each engagement metric is distributed across users:

  • Generated Questions & Answered Questions: Most users generate and answer only a small number of questions, as shown by the wide base near zero. A few users are highly active, producing a long tail of outliers with much higher counts.
  • Blocked Sites: The vast majority of users do not block any sites (distribution concentrated at zero), with only a handful blocking more than one site.
  • Levels Attempted: Most users attempt only one level, with very few exploring multiple levels. The distribution is sharply peaked at one, with a small tail for higher values.

Overall, the violin plots highlight that engagement is highly skewed: most users interact minimally, while a small subset are much more active or exploratory. This pattern is consistent across all four metrics.
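The median and IQR reported alongside each violin plot can be computed per metric as in the following sketch. It is an illustration under stated assumptions rather than the code used for Figure 10: the sample values are invented, the helper name `summarize` is hypothetical, and quartiles use the default "exclusive" method of Python's `statistics.quantiles`, which may differ slightly from other tools' defaults.

```python
from statistics import median, quantiles


def summarize(values):
    """Return median, quartiles, IQR, and the share of the total held by the top user."""
    q1, _q2, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    total = sum(values)
    return {
        "median": median(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "max": max(values),
        # Crude skew indicator: fraction of all activity contributed by the most active user.
        "top_user_share": max(values) / total if total else 0.0,
    }


# Invented per-user question counts with one heavy user; not the study's data.
generated_questions = [0, 0, 1, 1, 2, 3, 5, 40]

stats = summarize(generated_questions)
for key, value in stats.items():
    print(f"{key}: {value}")
```

For skewed distributions like these, the median and IQR describe the typical user better than the mean, which a few heavy users can dominate.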